ChIP-sequencing is a method of choice to localize the positions of protein binding sites on DNA on a whole-genome
scale. Deciphering the sequencing data produced by this novel technique is challenging and requires
rigorous interpretation using dedicated tools and adapted visualization programs. Here, we present a bioinformatics tool
(D-peaks) that adds several features (including user-friendliness, high-quality output and positions relative to
genomic features) to the well-known visualization browsers and databases that already exist. D-peaks is directly available
through its web interface http://rsat.ulb.ac.be/dpeaks/ as well as a command line tool.



In recent years, researchers have been challenged by the
development of novel techniques derived from high-throughput
sequencing (e.g., genome, RNA or exome sequencing and
epigenetic sequencing approaches). Among those, a very popular
approach is ChIP-sequencing (ChIP-seq), which is currently
widely used to analyze the interactions of proteins (e.g.,
transcription factors and chromatin-modifying enzymes) with DNA.
ChIP-seq is now replacing ChIP-chip as the method of choice for
the exhaustive discovery of precise genome-wide DNA binding
sites for a protein of interest.

Briefly, ChIP-seq consists of cross-linking DNA to
proteins (among which is the protein of interest) with a
chemical agent, then fragmenting the DNA into pieces of about
50 to 500 bp. The DNA pieces linked to the protein of interest
are then immunoprecipitated using an antibody directed against
this protein. Finally, the DNA pieces, enriched in the binding
sites of the protein of interest, are sequenced (Fig. 1).

The following step is performed in silico. It consists of
identifying the portions of the genome that are enriched in short
sequenced fragments (short reads), which are potential binding
sites for the studied protein. These regions are named peaks as
they correspond to areas of the genome that are highly covered by
the sequencing. This step, called peak-calling, is a major
challenge in current bioinformatics, as illustrated by the
impressive number of tools dedicated to this task (for reviews
see refs. 2 and 3). These tools not only specify where the peaks
are located but also generally export the results by attributing
a score to each position of the genome. This score generally
corresponds to the number of sequenced reads covering a given
genomic position (i.e., the read enrichment level). When plotted
against the genomic position, high scores appear as peaks (Fig. 3).
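
The notion of a per-position score can be illustrated with a minimal Python sketch. It assumes that reads are given as half-open (start, end) intervals on a single chromosome and that a peak is simply a maximal run of positions whose coverage reaches a fixed threshold. Real peak-calling programs rely on statistical models and control samples, so the function names and the threshold rule below are illustrative assumptions, not the method of any particular tool.

```python
from typing import List, Tuple


def coverage(reads: List[Tuple[int, int]], length: int) -> List[int]:
    """Count how many reads overlap each position of a region of `length` bp.

    Reads are half-open intervals [start, end) in 0-based coordinates.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (length + 1)
    for start, end in reads:
        start = max(start, 0)
        end = min(end, length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    # The running sum of the difference array is the per-position coverage.
    cov, running = [], 0
    for delta in diff[:length]:
        running += delta
        cov.append(running)
    return cov


def call_peaks(cov: List[int], threshold: int) -> List[Tuple[int, int]]:
    """Return maximal half-open intervals where coverage >= threshold."""
    peaks, peak_start = [], None
    for pos, score in enumerate(cov):
        if score >= threshold and peak_start is None:
            peak_start = pos
        elif score < threshold and peak_start is not None:
            peaks.append((peak_start, pos))
            peak_start = None
    if peak_start is not None:
        peaks.append((peak_start, len(cov)))
    return peaks
```

The coverage list plays the role of the score exported by a peak caller, and the intervals returned by `call_peaks` correspond to the regions that would appear as peaks when the score is plotted.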

Plotting the ChIP-seq peaks can be achieved using numerous
bioinformatics libraries, tools like Seqminer or browsers such
as EnsEMBL, UCSC or IGV. However, even if the possibilities
offered by these well-established tools are numerous and
impressive (mainly for their exhaustiveness and flexibility),
limitations appear when the displayed figures are exported. Those
figures are sometimes not very aesthetic and may not fulfill the
usual criteria for a publication or a presentation. Moreover, the
software mentioned above is generally so powerful and complete
that it may be difficult for the non-expert user to find the exact
combination of options and manipulations needed to get a simple
image of his/her ChIP-seq results that is comparable to the
standards of the field. Finally, to our knowledge, none of these
tools allows the user to display chromosome coordinates from a
given point (i.e., relative coordinates), which would show the
distance between a peak and a feature of interest (e.g.,
transcription start or end site). Aware of these limitations, we
developed D-peaks (draw-peaks), a user-friendly tool (with a
few simple options) able to render several tracks of continuous
values along the genes and the genome in high-quality pictures
and to specify coordinates relative to any position on the
chromosome (Fig. 2). Its main advantages over the popular genome
browsers mentioned above are its user-friendliness, the easy
customization of titles and labels, the possibility to display
relative genomic coordinates and the high quality of the
resulting figures.
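
The idea of relative coordinates can be sketched in a few lines of Python. The helpers below are hypothetical and are not taken from the D-peaks source: they assume a reference position such as a transcription start site and a strand given as '+' or '-', and they return signed offsets in which negative values lie upstream of the reference.

```python
from typing import Tuple


def relative_position(position: int, reference: int, strand: str = "+") -> int:
    """Signed distance in bp from `reference` to `position`.

    Negative values are upstream of the reference and positive values are
    downstream, taking the strand of the feature into account.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    offset = position - reference
    # On the reverse strand, increasing genomic coordinates run upstream.
    return offset if strand == "+" else -offset


def relative_window(start: int, end: int, reference: int,
                    strand: str = "+") -> Tuple[int, int]:
    """Express an absolute window [start, end] relative to `reference`."""
    a = relative_position(start, reference, strand)
    b = relative_position(end, reference, strand)
    return (min(a, b), max(a, b))
```

With such a conversion, a peak summit located 500 bp before a transcription start site on the forward strand is labeled -500 on the axis instead of its absolute chromosome coordinate.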

Figure 1. ChIP-seq principle. Classically, DNA and DNA-binding proteins are cross-linked with formaldehyde in cell extracts. DNA is then extracted and
sonicated into small pieces, and the fragments that are linked to the protein of interest are immunoprecipitated using a specific antibody. The
cross-link reaction is then reversed and the precipitated DNA fragments are sequenced. The output of the sequencer consists of millions of short reads
that should correspond to the DNA binding sites of the protein of interest.

Figure 2. D-peaks main screen. Users are invited to submit the WIG files and to choose among a few simple parameters to ensure an
optimal rendering of their ChIP-seq data.

As mentioned above, together with the position of potential
binding sites, files of continuous score values (WIGGLE or WIG
files) are generally generated by the peak-calling programs. Up
to five WIG files (compressed or not) with a total size smaller
than 200 Mb can easily be submitted to D-peaks; the command line
tool has no such restriction. As is the case for some online
browsers, the files remain stored on the server and thus need
not be re-uploaded for each picture using the same data. Some
options, such as the absolute (and the relative) genomic position,
the DNA strand and some other aesthetic settings (labels and
scale of the axes, colors of the peaks, etc.), must then be
specified to obtain the resulting figure. For example, Figure 3
displays a typical output of D-peaks where the binding sites of
different transcription factors are visualized in the vicinity of
the Lefty1 gene. Depending on the number and size of the uploaded
scoring files, the figure is produced in seconds or minutes.
Currently, the online version of D-peaks works with human, mouse,
zebrafish and Drosophila assemblies, but other assemblies, genomes
or features could easily be added in the future or on simple
request. A guide, as well as a pre-filled demonstration form, is
available from the main site to help new users run the tool
online as well as from the command line.
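
To make the structure of the WIG scoring files more concrete, the following Python sketch reads the two standard WIG block types (variableStep and fixedStep) into simple records. It is only an illustration and not the Perl parser used by D-peaks: the function names are hypothetical, and gzip is assumed as the compression format, which the text does not specify.

```python
import gzip
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, int, int, float]


def parse_wig(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (chrom, start, end, value) records from WIG-formatted lines.

    Coordinates are 1-based and inclusive, as in the WIG format.
    """
    mode, chrom, span, step, pos = None, None, 1, 1, 0
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if fields[0] in ("variableStep", "fixedStep"):
            # Declaration line: key=value pairs describe the block that follows.
            params: Dict[str, str] = dict(f.split("=", 1) for f in fields[1:])
            mode = fields[0]
            chrom = params["chrom"]
            span = int(params.get("span", 1))
            if mode == "fixedStep":
                pos = int(params["start"])
                step = int(params.get("step", 1))
        elif mode == "variableStep":
            # Data line: explicit position followed by the score.
            start = int(fields[0])
            yield chrom, start, start + span - 1, float(fields[1])
        elif mode == "fixedStep":
            # Data line: score only, the position advances by `step`.
            yield chrom, pos, pos + span - 1, float(fields[0])
            pos += step
        else:
            raise ValueError("data line found before any step declaration")


def read_wig(path: str) -> List[Record]:
    """Read a plain or gzip-compressed WIG file into a list of records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return list(parse_wig(handle))
```

Each record gives the interval covered by one score, which is the information a drawing routine needs to plot a continuous track along a chromosome.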

The genomic and scoring files are parsed with a script written
in the Perl programming language, which in turn calls the
R statistical environment to draw the requested figures. The
website consists of a simple PHP layer on top of the Perl script.
R and Perl must thus be installed for D-peaks to work
programmatically (i.e., in command line mode). This programming
strategy makes D-peaks easily portable, as it does not depend on
the web interface. However, the whole figure must be recomputed
each time an input parameter is changed.

With the advent of new techniques based on high-throughput
sequencing, ChIP-seq has recently become one of the most
successful genomics techniques for locating the DNA binding
sites of a protein. However, even if several powerful and
exhaustive, but sometimes complicated, browsers allow the
visualization of these results, a clean, high-quality rendering
of these data is, to our knowledge, not easily achieved. We thus
developed D-peaks, a ChIP-seq result analysis tool that draws a
precise representation of several ChIP-seq experiments along the
genome in a few very simple steps. We are convinced that this
tool may be of high interest to any scientist working and
publishing in the ChIP-seq field, as indicated by threads on
specialized online forums asking for such a tool and by the fact
that this type of representation has become a standard of the
field.
Figure 3. D-peaks results. This figure is a partial reproduction of a figure from Whyte et al. (2012), in which the authors showed that Sox2, Nanog
and Oct4 (and other factors not shown) bind the same regions as the histone demethylase LSD1 in the promoter of the Lefty1 gene (shown here). Note
that the position of the peaks relative to the Lefty1 transcription start site is clearly visible with our tool. Color code of the gene: red, UTR;
blue, introns; green, exons.